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Abstract 

Severe acute respiratory syndrome (SARS) is a novel human illness caused by a previously unrecognized coronavirus (CoV) 
termed SARS-CoV. There are conflicting reports on the animal reservoir of SARS-CoY. Many of the groups that argue carnivores 
are the original reservoir of SARS-CoV use a phylogeny to support their argument. However, the phylogenies in these studies often 
lack outgroup and rooting criteria necessary to determine the origins of SARS-CoV. Recently, SARS-CoV has been isolated from 
various species of Chiroptera from China (e.g., Rhinolophus sinicus) thus leading to reconsideration of the original reservoir of 
SARS-CoV. We evaluated the hypothesis that SARS-CoV isolated from Chiroptera are the original zoonotic source for SARS-CoV 
by sampling SARS-CoV and non-SARS-CoV from diverse hosts including Chiroptera, as well as carnivores, artiodactyls, rodents, 
birds and humans. Regardless of alignment parameters, optimality criteria, or isolate sampling, the resulting phylogenies clearly 
show that the SARS-CoV was transmitted to small carnivores well after the epidemic of SARS in humans that began in late 2002. 
The SARS-CoV isolates from small carnivores in Shenzhen markets form a terminal clade that emerged recently from within the 
radiation of human SARS-CoV. There is evidence of subsequent exchange of SARS-CoV between humans and carnivores. In 
addition SARS-CoV was transmitted independently from humans to farmed pigs (Sus scrofa). The position of SARS-CoV isolates 
from Chiroptera are basal to the SARS-CoV clade isolated from humans and carnivores. Although sequence data indicate that 
Chiroptera are a good candidate for the original reservoir of SARS-CoV, the structural biology of the spike protein of SARS-CoV 
isolated from Chiroptera suggests that these viruses are not able to interact with the human variant of the receptor of SARS-CoV, 
angiotensin-converting enzyme 2 (ACE2). In SARS-CoV we study, both visually and statistically, labile genomic fragments and, 
putative key mutations of the spike protein that may be associated with host shifts. We display host shifts and candidate mutations 
on trees projected in virtual globes depicting the spread of SARS-CoV. These results suggest that more sampling of coronaviruses 
from diverse hosts, especially Chiroptera, carnivores and primates, will be required to understand the genomic and biochemical 
evolution of coronaviruses, including SARS-CoV. 

© The Willi Hennig Society 2008. 


Severe acute respiratory syndrome (SARS) is a 
recently described human infectious disease caused by 
a previously unrecognized coronavirus, SARS-CoV 
(Ksiazek et al., 2003). Between November 2002 and 
August 2003, there were 8422 cases and 916 deaths from 
SARS (WHO, 2003). These numbers are not on the scale 
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of major epidemics such as seasonal forms of influenza 
infecting humans, but in an era of rapid globalization, 
the potential for a pandemic was significant. SARS-CoV 
infection has not been reported among humans since the 
early days of 2004. However, there remain conflicting 
reports on the animal reservoir of SARS-CoV. Guan 
et al. (2003) and Kan et al. (2005) implicate small 
carnivores whereas Li et al. (2005) and Lau et al. 
(2005) asserted that Chiroptera are the animal reservoir 
of SARS-CoV. In a comprehensive review of CoV 
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among Chiroptera, Tang et al. (2006) argued that the 
origin of SARS-CoV remains unknown. 

Among humans, serological surveys indicate that 
SARS-CoV viruses were circulating in subepidemic levels 
in 2001 in residents of Hong Kong (data from mainland 
China is not available) (Zheng et al., 2004). Also, in 
describing the world’s largest SARS epidemic in Beijing, 
Pang et al. (2003) point out that “It is possible that some 
SARS cases were not counted before mid-April 2003 
when the extent of the outbreak was fully recognized.” 

In a search for the animal reservoir of SARS-CoV 
outside of urban areas Kan et al. (2005) surveyed 
farmed Parguma larvata (Himalayan palm civet) in 25 
farms spread over 12 provinces in South-east China and 
found no evidence of SARS-CoV infection. SARS-CoV 
in carnivores was isolated to animals in the Xinyuan 
market, in the suburbs of Guangzhou, China. 
Vijaykrishna et al. (2007) make the argument that 
Chiroptera are a reservoir for a wide variety of 
coronaviruses (SARS and non-SARS) that affect 
humans and animals. Before the SARS outbreak, 
coronaviruses were known primarily from animals of 
agricultural importance in which they cause respiratory 
and enteric infections (Siddell et al., 1983). The human 
strains CoV-229E and CoV-OC43, which are distantly 
related to SARS-CoV, cause mild respiratory illnesses 
similar to the common cold (Mahony and Richardson, 
2005). Recently Dominguez et al. (2007) have shown 
that Chiroptera ( Myotis occultus and Eptesicus fuscus 
from the Rocky Mountains of Colorado, USA, carry 
group 1 coronaviruses. Our preliminary analyses show 
that these Co Vs from Rocky Mountain Chiroptera are 
very closely related to group 1 CoV that infect humans 
(e.g., CoV-229E and CoV-OC43). 

Genomic sequence data 

The genome of a coronavirus is comprised of a 
single-stranded, positive-sensed RNA molecule 27-31 
kilobases in length (Lai, 1990). Before the SARS-CoV 
outbreak coronavirus diversity was poorly docu¬ 
mented, especially at the genomic level. However, 
coronavirus research has been invigorated since the 
sequencing of the first SARS-CoV isolate (Marra 
et al., 2003; Rota et al., 2003). For example, in the 
wake of SARS, two novel human coronaviruses were 
found [HKU1, GenBank (http://www.ncbi.nlm.nih.- 
gov) accession AY597011 (Woo et al., 2005); and 
NL63, GenBank accession NC_005831 (van der Hoek 
et al., 2004)]. Also notable are the release of new 
genomic sequences for SARS-CoV among carnivores, 
artiodactyls, humans and Chiroptera (Guan et al., 
2003; Chinese SARS Molecular Epidemiology Con¬ 
sortium, 2004; Tu et al., 2004; Chen et al., 2005; Lau 
et al., 2005; Li et al., 2005; Tang et al., 2006). 


Guan et al. (2003) sequenced several partial and 
complete genomes from SARS-CoV isolated in 2003 
from two small carnivore hosts Parguma larvata and 
Nyctereutes procyonoides (raccoon dog) that were for 
sale in live animal markets in Shenzhen, Guangdong 
Province, China. Complete and partial genomes of the 
coronaviruses isolated from P. larvata [SARS-CoV SZ1, 
SZ16, SZ3; GenBank accessions AY304489, AY304488 
and AY304486] and Nyctereutes procyonoides (SARS- 
CoV SZ13; GenBank accession AY304487) became 
available publically in September 2003 but were updated 
in November 2003. A complete genome of a SARS-CoV 
isolated from P. larvata host was released in January, 
2005 (SARS-CoV HC/SZ/61/03; GenBank accession 
AY515512). A complete genome of SARS-CoV isolated 
from Melogale moschata , the Chinese ferret badger, was 
released in March, 2005 (SARS coronavirus 
CFB/SZ/94/03; GenBank accession AY545919). 

Several, but not all of the genomes of the coronav¬ 
iruses isolated from small carnivores contain a specific 
29-nucleotide region (CCTACTGGTTACCAA- 
CCTGAATGGAATAT, e.g., positions 27869-27897 
in the of AY304488) in a protein with an unknown 
function. It was initially reported that this 29-nucleotide 
region was absent from all human SARS-CoV isolates 
sequenced with the notable exception of one isolate from 
Guangdong that contains the 29-nucleotide region 
(GD01 GenBank accession AY278489) (Guan et al., 
2003); however, several human isolates were later 
discovered to contain the region. Owing to the perceived 
potential of the 29-nucleotide region as a clue to the 
animal origins and subsequent adaptation of SARS- 
CoV to human hosts, this 29-nucleotide region garnered 
media attention as early as May 2003 as a “29- 
nucleotide deletion” in human SARS-CoV that enabled 
animal to human transmission (Bradsher and Altman, 
2003; Enserink, 2003). 

SARS-CoV isolates from Chiroptera contain a differ¬ 
ent 29-nucleotide sequence (CCA AT AC ATT ACT ATT- 
CGGACTGGTTTAT, e.g., positions 27866-27894 in 
DQ648857, Bat coronavirus BtCoV/279/2005) in a 
protein with an unknown function. This fragment from 
isolates of SARS-CoV derived from Chiroptera is in an 
orthologous genomic position to the 29-nucleotide 
region described above for some SARS-CoV isolated 
from small carnivores and humans. When the 29- 
nucleotide regions from Chiroptera versus human and 
carnivore hosts are compared, 12 nucleotide positions 
are polymorphic (Lau et al., 2005). Under the current 
sampling of SARS-CoV, this fragment is exclusive to 
SARS-CoV isolated from Chiroptera. 

The Chinese SARS Molecular Epidemiology Consor¬ 
tium (2004) published an analysis of molecular evolution 
of SARS-CoV within humans during the 2002-03 
epidemic. This study included the release of many new 
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genomic sequences of SARS-CoV from humans infected 
in the early stages of the outbreak in southern China 1 . 

A human SARS-CoV associated with a re-emergent 
case of SARS in Guangzhou, Guangdong Province, 
China was isolated December 22, 2003. The sequence of 
this SARS-CoV spike gene was released in February 
2004 (SARS-CoV GD03T0013; GenBank accession 
AY525636). 

Song et al. (2005) released many full and partial 
genome sequences of SARS-CoV isolated from human 
and palm civet cats collected in southern China into the 
public domain in 2005“. Kan et al. (2005) released many 
spike gene and three full genome sequences for SARS- 
CoV isolated from human, raccoon dog and civet cat 
hosts into the public domain in July, 2006 . 

Li et al. (2005) 4 published SARS-CoV nucleoprotein 
and spike gene sequences (some recently updated as 
whole genomes) isolated from Chiroptera: Rhinolophus 


1 GenBank accession numbers for SARS-CoV sequences released in 
January 2004: AY394978 AY394979 AY394980 AY394981 AY394982 
AY394983 AY394984 AY394985 AY394986 AY394987 AY394989 
AY394990 AY394991 AY394992 AY394993 AY394994 AY394995 
AY394996 AY394997 AY394999 AY395000 AY395001 AY395002 
AY395003 AY395004. 

'■y 

GenBank accession numbers for SARS-CoV sequences released in 
2005: AY313906 AY338174 AY338175 AY348314 AY394850 
AY461660 AY485277 AY485278 AY525636 AY568539 AY613947 
AY613948 AY613949 AY613950 AY613951 AY613952 AY613953 
AY627044 AY627045 AY627046 AY627047 AY627048 

3 AY687354 AY687357 AY687358AY687361 AY687365 AY687370 
AY686863 AY572034 AY687372 AY687362 AY686864 AY687364 
AY687367 AY572038 AY304486 AY687363 AY687355 AY687369 
AY687366 AY687371 AY525636 AY687359 note erratum published 
to correct accession numbers and SNPs (Kan et al. (2005) 

4 GenBank accession numbers for SARS-CoV sequences released as 
nucleocapsid sequences in January 2006 and then as whole genomes in 
June 2006: DQ071611, DQ071612. Whole genomes released in January 
2006: DQ071615. Nucleocapsid sequences released in January 2006: 
DQ071613, DQ071614. Spike sequences released in November 2005 
revised in July 2006: DQ159956, DQ159957. 

5 GenBank accession numbers for whole genomes released in 
September 2005 and later updated in October 2005: DQ022305, 
DQ084199, DQ084200. 

6 GenBank accession numbers for RNA-dependent RNA polymer¬ 
ase, polyprotein gene and spike gene: AY864196, AY864197, 
AY864198. 

7 GenBank accessions for genomes DQ648794, DQ648856, 
DQ648857, various genes DQ648786 DQ648786 DQ648787 

DQ648788 DQ648789 DQ648790 DQ648791 DQ648792 DQ648793 
DQ648795 DQ648796 DQ648797 DQ648799 DQ648800 DQ648801 
DQ648802 DQ648803 DQ648804 DQ648805 DQ648806 DQ648807 
DQ648808 DQ648809 DQ648810 DQ648811 DQ648812 DQ648813 
DQ648814 DQ648815 DQ648816 DQ648817 DQ648818 DQ648819 
DQ648820 DQ648821 DQ648822 DQ648823 DQ648824 DQ648825 
DQ648826 DQ648827 DQ648828 DQ648829 DQ648830 DQ648831 
DQ648832 DQ648833 DQ648834 DQ648835 DQ648836 DQ648837 
DQ648838 DQ648839 DQ648840 DQ648841 DQ648842 DQ648843 
DQ648844 DQ648845 DQ648846 DQ648847 DQ648848 DQ648849 
DQ648850 DQ648851 DQ648852 DQ648853 DQ648854 DQ648855 
DQ648858. 


sinicus , Rhinolophus ferrumequinum , Rhinolophus macro- 
tis and Rhinolophus pearsoni. Lau et al. (2005) 5 pub¬ 
lished three complete SARS-CoV genomes isolated from 
the bat Rhinolophus pearsoni and a SARS-CoV poly¬ 
merase sequences from Rhinolophus sinicus. Poon et al. 
(2005) 6 published sequences of RNA-dependent RNA 
polymerase (RdRp), polyprotein, and spike genes of a 
non-SARS-CoV isolated from the bat Miniopterous 
pusillus. Tang et al. (2006) published a review of bat 
coronaviruses in August, 2006 and released three 
genomes and 70 gene fragments in July, 2006. 

Receptor binding studies 

Li et al. (2006) provide a review of the structural 
biology of the SARS-CoV spike protein and the 
variation of the receptor for spike protein on host cells, 
angiotensin-converting enzyme 2 (ACE2), among hu¬ 
man and carnivore hosts. These authors point out via 
pairwise alignment that the spike protein of SARS-CoV 
isolated from Chiroptera lack a stretch of amino acid 
residues and have mismatches among other residues that 
form the receptor-binding motif for the human variant 
of ACE2. 

There is also empirical evidence concerning the 
relative affinity of various spike proteins to ACE2 from 
various hosts. The SARS-CoV spike proteins tested 
include: an early epidemic, 2002-03, human isolate 
(SARS-CoV, TOR 2), a human isolate tied to sporadic 
infections in 2003-04 (SARS-CoV, GD03T0013), and a 
carnivore isolate ( P . larvata , SZ3) from 2003 to 2003 (Li 
et al., 2005). Li et al. (2005, 2006) describe and 
“expected” result for SZ3 and an “unexpected” result 
for GD03T0013 that both of these spike proteins bound 
P. larvata ACE2 better than they bound human ACE2. 
Spike protein from TOR 2 bound ACE2 from P. larvata 
and human equally well. The unexpected nature of their 
results is tied to the perception that the SARS-CoV virus 
was adapting from carnivore to humans as suggested by 
prevailing phylogenetic studies of the time (e.g., Guan 
et al., 2003; Chinese SARS Molecular Epidemiology 
Consortium, 2004; Kan et al., 2005; Song et al., 2005). 

Methods 

Demarcation of sequence characters 

We compared nucleotide sequences for whole and 
partially sequenced genomes that were in the public 
domain as of January 1, 2005. This data set included 83 
viruses from a wide host and geographic range 
(Table 1). First, we compared these genomes with 
ClustalW under default settings (i.e., gap opening 
penalty 15 gap extension penalty 6.66, DNA transition 
weight 0.5) (Thompson et al., 1994) and developed a set 
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Table 1 

GenBank accession numbers and descriptions of genomes and partial 
genomes of virus exemplars considered in the 83 isolate data set 


GenBank accession no. 

Name of vims 

AF124986 

Canine coronavirus 

AF124987 

Feline infectious peritonitis virus 

AF124988 

Porcine hemagglutinating 
encephalomyelitis virus 

AF124989 

Human coronavirus OC43 

AF124990 

Rat sialodacryoadenitis coronavirus 

AF124991 

Turkey coronavirus 

AF201929 

Murine hepatitis strain 2 

AF207902 

Murine hepatitis virus ML 11 

AF208066 

Murine hepatitis virus Penn 971 

AF208067 

Murine hepatitis virus ML 10 

AF220295 

Bovine coronavirus Quebec 

AF304460 

Human coronavirus 229E 

AF391542 

Bovine coronavirus LUN 

AJ271965 

Transmissible gastroenteritis virus 

AY278487 

SARS coronavirus BJ02 

AY278488 

SARS coronavirus BJ01 

AY278489 

SARS coronavirus GD01 

AY278490 

SARS coronavirus BJ03 

AY278491 

SARS coronavirus HKU39849 

AY278554 

SARS coronavirus CUHK Wl 

AY278741 

SARS coronavirus Urbani 

AY279354 

SARS coronavirus BJ04 

AY282752 

SARS coronavirus CUHK SulO 

AY283794 

SARS coronavirus SIN 2500 

AY283795 

SARS coronavirus SIN 2677 

AY283796 

SARS coronavirus SIN 2679 

AY283797 

SARS coronavirus SIN 2748 

AY283798 

SARS coronavirus SIN 2774 

AY291315 

SARS coronavirus Frankfurtl 

AY291451 

SARS coronavirus TW1 

AY297028 

SARS coronavirus ZJ01 

AY304486 

SARS coronavirus SZ3 civet cat 

AY304487 

SARS coronavirus SZ13 civet cat 

AY304488 

SARS coronavirus SZ16 civet cat 

AY304489 

SARS coronavirus SZ1 raccoon dog 

AY304490 

SARS coronavirus GZ43 

A Y304491 

SARS coronavirus GZ60 

AY304492 

SARS coronavirus HKU 36871 

AY304493 

SARS coronavirus HKU 65806 

AY304494 

SARS coronavirus HKU 66078 

AY304495 

SARS coronavirus GZ50 

AY313906 

SARS coronavirus GD69 

AY321118 

SARS coronavirus TWC 

AY323977 

SARS coronavirus HSR1 

AY345986 

SARS coronavirus CUHK AG01 

AY345987 

SARS coronavirus CUHK AG02 

AY390556 

SARS coronavirus GZ02 

AY394978 

SARS coronavirus GZ B 

AY394979 

SARS coronavirus GZ C 

AY394980 

SARS coronavirus GZ D 

AY394981 

SARS coronavirus HGZ8L1 A 

AY394982 

SARS coronavirus HGZ8L1 B 

AY394983 

SARS coronavirus HSZ2 A 

AY394984 

SARS coronavirus HSZ A 

AY394985 

SARS coronavirus HSZ Bb 

AY394986 

SARS coronavirus HSZ Cb 

AY394987 

SARS coronavirus HZS2 Fb 

AY394989 

SARS coronavirus HZS2 D 

AY394990 

SARS coronavirus HZS2 E 

AY394991 

SARS coronavirus HZS2 Fc 

AY394992 

SARS coronavirus HZS2 C 


Table 1 

( Continued) 


GenBank accession no. 

Name of virus 

AY394993 

SARS coronavirus HGZ8L2 

AY394994 

SARS coronavirus HSZ Be 

AY394995 

SARS coronavirus HSZ Cc 

AY394996 

SARS coronavirus ZS B 

AY394997 

SARS coronavirus ZS A 

AY394999 

SARS coronavirus LC2 

AY395000 

SARS coronavirus LC3 

AY395001 

SARS coronavirus LC4 

AY395002 

SARS coronavirus LC5 

AY395003 

SARS coronavirus ZS C 

AY395004 

SARS coronavirus HZS2 Bb 

AY515512 

SARS coronavirus HC SZ 61 03 
civet cat 

AY525636 

SARS coronavirus GD03T0013 

AY567487 

Human Coronavirus NL63 

AY654624 

SARS coronavirus TJF pig 

BCU00735 

Bovine coronavirus Mebus 

NC 001451 

Avian infectious bronchitis virus 

NC 001846 

Murine hepatitis virus MHVA59 

NC 003045 

Bovine coronavirus 

NC 003436 

Porcine epidemic diarrhea virus 

NC 004718 

SARS coronavirus TOR2 

NC 005147 

Human coronavirus OC43 NL 


of fragment boundaries that accommodated both 
sequence similarity and unequal sequencing coverage. 
We then split the genomes along these boundaries and 
remove all gaps inserted by ClustalW, thus forming 62 
sequence fragment characters for POY3 (Wheeler et al., 
2006). 

We use the same ClustalW settings to produce an 
updated aligned data set of whole and partially 
sequenced genomes that were in the public domain as 
of July 21, 2006. The updated data set includes 157 
viruses many of which were isolated from Chiroptera 
and small carnivore hosts (Table 2). We then split the 
genomes along 66 boundaries and removed all gaps 
inserted by ClustalW, thus forming an updated set of 67 
sequence fragment characters for POY3. 

We produced a data set of 113 whole genomes of S ARS- 
CoV from human, Chiroptera, swine and carnivore hosts 
(Table 3) that were available to the public as of July 21, 
2006. We used a single outgroup, human coronavirus 
NL63 (GenBank accession no. AY567487). The 
sequences in this data set were similar enough to align 
without splitting them into sequence fragment characters. 
Together these 114 complete genome sequences were 
aligned using default settings in ClustalW. This align¬ 
ment was analyzed with standard tree search methods. 

Sensitivity analysis plus tree fusion under direct optimi¬ 
zation 

Direct optimization (Wheeler, 1996) works by creat¬ 
ing parsimonious hypothetical ancestral sequences at 
internal nodes of a cladogram. The key difference 
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Table 2 


Table 2 


GenBank accession numbers and descriptions of genomes and partial 

( Continued) 


genomes of virus exemplars considered in the 157 isolate data set 





GenBank accession no. 

Name of virus 

GenBank accession no. 

Name of virus 





AY394978 

SARS coronavirus GZ B 

AF124986 

Canine coronavirus 

AY394979 

SARS coronavirus GZ C 

AF124987 

Feline infectious peritonitis 

AY394980 

SARS coronavirus GZ D 

AF124988 

Porcine hemagglutinating encep 

AY394981 

SARS coronavirus HGZ8L1 A 

AF124989 

Fluman coronavirus strain OC43 

AY394982 

SARS coronavirus HGZ8L1 B 

AF124990 

Rat sialodacryoadenitis CoV 

AY394983 

SARS coronavirus HSZ2 A 

AF124991 

Turkey coronavirus 

AY394984 

SARS coronavirus HSZ A 

AF201929 

Murine hepatitis 2 

AY394985 

SARS coronavirus HSZ Bb 

AF207902 

Murine hepatitis ML 11 

AY394986 

SARS coronavirus HSZ Cb 

AF208066 

Murine hepatitis Penn 97 1 

AY394987 

SARS coronavirus HZS2 Fb 

AF208067 

Murine hepatitis ML 10 

AY394988 

SARS coronavirus JMD 

AF220295 

Bovine coronavirus Quebec 

AY394989 

SARS coronavirus HZS2 D 

AF304460 

Human coronavirus 229E 

AY394990 

SARS coronavirus HZS2 E 

AF391542 

Bovine CoV LUN 

AY394991 

SARS coronavirus HZS2 Fc 

AJ271965 

Transmissible gastroenteritis 

AY394992 

SARS coronavirus HZS2 C 

AP006557 

SARS coronavirus TWH 

AY394993 

SARS coronavirus HGZ8L2 

AP006558 

SARS coronavirus TWJ 

AY394994 

SARS coronavirus HSZ Be 

AP006559 

SARS coronavirus TWK 

AY394995 

SARS coronavirus HSZ Cc 

AP006560 

SARS coronavirus TWS 

AY394996 

SARS coronavirus ZS B 

AP006561 

SARS coronavirus TWY 

AY394997 

SARS coronavirus ZS A 

AY278487 

SARS coronavirus BJ02 

AY394998 

SARS coronavirus LC1 

AY278488 

SARS coronavirus BJ01 

AY394999 

SARS coronavirus LC2 

AY278489 

SARS coronavirus GD01 

AY395000 

SARS coronavirus LC3 

AY278490 

SARS coronavirus BJ03 

AY395001 

SARS coronavirus LC4 

AY278491 

SARS coronavirus HKU 39849 

AY395002 

SARS coronavirus LC5 

AY278554 

SARS coronavirus CUHK W1 

AY395003 

SARS coronavirus ZS C 

AY278741 

SARS coronavirus Urbani 

AY395004 

SARS coronavirus HZS2 Bb 

AY279354 

SARS coronavirus BJ04 

AY427439 

SARS coronavirus AS 

AY282752 

SARS coronavirus CUHK SulO 

AY461660 

SARS coronavirus SoD 

AY283794 

SARS coronavirus Sin2500 

AY463059 

SARS coronavirus Shanghai QXC1 

AY283795 

SARS coronavirus Sin2677 

AY485277 

SARS coronavirus Sinol 11 

AY283796 

SARS coronavirus Sin2679 

AY485278 

SARS coronavirus Sino3 11 

AY283797 

SARS coronavirus Sin2748 

AY502923 

SARS coronavirus TW10 

AY283798 

SARS coronavirus Sin2774 

AY502924 

SARS coronavirus TW11 

AY291315 

SARS coronavirus Frankfurt 1 

AY502925 

SARS coronavirus TW2 

AY291451 

SARS coronavirus TW1 

AY502926 

SARS coronavirus TW3 

AY297028 

SARS coronavirus ZJ01 

AY502927 

SARS coronavirus TW4 

AY304486 

SARS coronavirus SZ3 

AY502928 

SARS coronavirus TW5 

AY304487 

SARS coronavirus SZ13 

AY502929 

SARS coronavirus TW6 

AY304488 

SARS coronavirus SZ16 

AY502930 

SARS coronavirus TW7 

AY304489 

SARS coronavirus SZ1 

AY502931 

SARS coronavirus TW8 

AY304490 

SARS coronavirus GZ43 

AY502932 

SARS coronavirus TW9 

AY304491 

SARS coronavirus GZ60 

AY508724 

SARS coronavirus NS 1 

AY304492 

SARS coronavirus HKU 36871 

AY515512 

SARS coronavirus HC SZ 61 03 

AY304493 

SARS coronavirus HKU 65806 

AY525636 

SARS coronavirus GD03T0013 

AY304494 

SARS coronavirus HKU 66078 

AY545914 

SARS coronavirus HC SZ 79 03 

AY304495 

SARS coronavirus GZ50 

AY545915 

SARS coronavirus HC SZ DM1 03 

AY310120 

SARS coronavirus FRA 

AY545916 

SARS coronavirus HC SZ 266 03 

AY313906 

SARS coronavirus GD69 

AY545917 

SARS coronavirus HC GZ 81 03 

AY321118 

SARS coronavirus TWC 

AY545918 

SARS coronavirus HC GZ 32 03 

AY323977 

SARS coronavirus HSR 

AY545919 

SARS coronavirus CFB SZ 94 03 

AY338174 

SARS coronavirus Taiwan TCI 

AY559082 

SARS coronavirus Sin852 

AY338175 

SARS coronavirus Taiwan TC2 

AY559084 

SARS coronavirus Sin3765V 

AY345986 

SARS coronavirus CUHK AG01 

AY559085 

SARS coronavirus Sin848 

AY345987 

SARS coronavirus CUHK AG02 

AY559086 

SARS coronavirus Sin849 

AY345988 

SARS coronavirus CUHK AG03 

AY559093 

SARS coronavirus Sin845 

AY348314 

SARS coronavirus Taiwan TC3 

AY559095 

SARS coronavirus Sin847 

AY350750 

SARS coronavirus PUMC01 

AY559096 

SARS coronavirus Sin850 

AY357075 

SARS coronavirus PUMC02 

AY567487 

Human Coronavirus NL63 

AY357076 

SARS coronavirus PUMC03 

AY568539 

SARS coronavirus GZ0401 

AY390556 

SARS coronavirus GZ02 

AY572034 

SARS coronavirus civet007 

AY394850 

SARS coronavirus WHU 

AY572035 

SARS coronavirus civetOlO 

AY394977 

SARS coronavirus GZ A 
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Table 2 
( Continued) 


GenBank accession no. 

Name of virus 

AY572038 

SARS coronavirus civet020 

AY613947 

SARS coronavirus GZ0402 

AY613948 

SARS coronavirus PC4-13 

AY613949 

SARS coronavirus PC4-136 

AY613950 

SARS coronavirus PC4-227 

AY613951 

SARS coronavirus PC4-127 

AY613952 

SARS coronavirus PC4-205 

AY613953 

SARS coronavirus GZ0403 

AY627044 

SARS coronavirus PC4-115 

AY627045 

SARS coronavirus PC4-137 

AY627046 

SARS coronavirus PC4-145 

AY627047 

SARS coronavirus PC4-199 

AY627048 

SARS coronavirus PC4-241 

AY654624 

SARS coronavirus TJF 

AY686863 

SARS coronavirus A022 

AY686864 

SARS coronavirus B039 

AY864197 

Bat coronavirus strain 61 

BCU00735 

Bovine coronavirus Mebus 

DQ022305 

Bat SARS coronavirus HKU3 1 

DQ071613 

Bat SARS coronavirus Rpl 

DQ071614 

Bat SARS coronavirus Rp2 

DQ071615 

Bat SARS coronavirus Rp3 

DQ084199 

Bat SARS coronavirus HKU3 2 

DQ084200 

Bat SARS coronavirus HKU3 3 

DQ412042 

Bat SARS coronavirus Rfl 

DQ412043 

Bat SARS coronavirus Rml 

DQ648857 

Bat coronavirus BtCoV 279 2005 

NC 001451 

Avian infectious bronchitis 

NC 001846 

Murine hepatitis virus 

NC 003045 

Bovine coronavirus 

NC 003436 

Porcine epidemic diarrhea virus 

NC 004718 

SARS coronavirus Toronto 2 

NC 005147 

Human coronavirus OC43 


between direct optimization and multiple alignment is 
that in direct optimization evolutionary differences in 
sequence length are accommodated, not by the use of 
gap characters, but rather by allowing insertion-deletion 
events between ancestral and descendant sequences. In 
direct optimization, evolutionary base substitution and 
insertion-deletion events are treated with the same edit 
costs that are used in standard studies using static 
alignment followed by search for a set of optimal tree(s). 
However, in direct optimization, alignment is dynamic 
in that a novel set of putative sequence homologies is 
considered each time a novel topology is considered. 
The best set(s) of homologies is discovered by searching 
for the topology(ies) that minimizes the global cost of 
substitution and indel events. 

Moreover, we varied alignment parameter sets across 
five sets of edit costs ranging from unitary costs 
for nucleotide insertion-deletions, transversions and 
transitions to costs with upweighted insertion-deletions 
and transversions (Tables 4 and 5) (Wheeler, 1995). This 
process of parallel direct optimization across many edit 
costs not only allows for analysis of whether the results 
are sensitive to parameter choice, but when also coupled 


Table 3 

GenBank accession numbers and descriptions of whole genomes of 
vims exemplars considered in the 114 isolate data set 


AP006557 

SARS 

coronavirus 

TWH 

AP006558 

SARS 

coronavirus 

TWJ 

AP006559 

SARS 

coronavirus 

TWK 

AP006560 

SARS 

coronavirus 

TWS 

AP006561 

SARS 

coronavirus 

TWY 

AY278487 

SARS 

coronavirus 

BJ02 

AY278488 

SARS 

coronavirus 

BJ01 

AY278489 

SARS 

coronavirus 

GD01 

AY278490 

SARS 

coronavirus 

BJ03 

AY278491 

SARS 

coronavirus 

HKU 39849 

AY278554 

SARS 

coronavirus 

CUHK Wl 

AY278741 

SARS 

coronavirus 

Urbani 

AY279354 

SARS 

coronavirus 

BJ04 

AY282752 

SARS 

coronavirus 

CUHK SulO 

AY283794 

SARS 

coronavirus 

Sin2500 

AY283795 

SARS 

coronavirus 

Sin2677 

AY283796 

SARS 

coronavirus 

Sin2679 

AY283797 

SARS 

coronavirus 

Sin2748 

AY283798 

SARS 

coronavirus 

Sin2774 

AY291315 

SARS 

coronavirus 

Frankfurt 1 

AY291451 

SARS 

coronavirus 

TW1 

AY297028 

SARS 

coronavirus 

ZJ01 

AY304486 

SARS 

coronavirus 

SZ3 

AY304488 

SARS 

coronavirus 

SZ16 

AY304495 

SARS 

coronavirus 

GZ50 

AY310120 

SARS 

coronavirus 

FRA 

AY313906 

SARS 

coronavirus 

GD69 

AY321118 

SARS 

coronavirus 

TWC 

AY323977 

SARS 

coronavirus 

HSR 

AY338174 

SARS 

coronavirus 

Taiwan TCI 

AY338175 

SARS 

coronavirus 

Taiwan TC2 

AY345986 

SARS 

coronavirus 

CUHK AG01 

AY345987 

SARS 

coronavirus 

CUHK AG02 

AY345988 

SARS 

coronavirus 

CUHK AGO3 

AY348314 

SARS 

coronavirus 

Taiwan TC3 

AY350750 

SARS 

coronavirus 

PUMC01 

AY357075 

SARS 

coronavirus 

PUMC02 

AY357076 

SARS 

coronavirus 

PUMC03 

AY390556 

SARS 

coronavirus 

GZ02 

AY394850 

SARS 

coronavirus 

WHU 

AY394978 

SARS 

coronavirus 

GZ B 

AY394979 

SARS 

coronavirus 

GZ C 

AY394981 

SARS 

coronavirus 

HGZ8L1 A 

AY394982 

SARS 

coronavirus 

HGZ8L1 B 

AY394983 

SARS 

coronavirus 

HSZ2 A 

AY394985 

SARS 

coronavirus 

HSZ Bb 

AY394986 

SARS 

coronavirus 

HSZ Cb 

AY394987 

SARS 

coronavirus 

HZS2 Fb 

AY394988 

SARS 

coronavirus 

JMD 

AY394989 

SARS 

coronavirus 

HZS2 D 

AY394990 

SARS 

coronavirus 

HZS2 E 

AY394991 

SARS 

coronavirus 

HZS2 Fc 

AY394992 

SARS 

coronavirus 

HZS2 C 

AY394993 

SARS 

coronavirus 

HGZ8L2 

AY394994 

SARS 

coronavirus 

HSZ Be 

AY394995 

SARS 

coronavirus 

HSZ Cc 

AY394996 

SARS 

coronavirus 

ZS B 

AY394997 

SARS 

coronavirus 

ZS A 

AY394998 

SARS 

coronavirus 

LC1 

AY394999 

SARS 

coronavirus 

LC2 

AY395000 

SARS 

coronavirus 

LC3 

AY395001 

SARS 

coronavirus 

LC4 

AY395002 

SARS 

coronavirus 

LC5 



D. Janies et al. / Cladistics 24 (2008) 111-130 


117 


Table 3 

( Continued) 


AY395003 

SARS coronavirus ZS C 

AY395004 

SARS coronavirus HZS2 Bb 

AY427439 

SARS coronavirus AS 

AY461660 

SARS coronavirus SoD 

AY463059 

SARS coronavirus ShanghaiQXCl 

AY485277 

SARS coronavirus Sinol 11 

AY485278 

SARS coronavirus Sino3 11 

AY502923 

SARS coronavirus TW10 

AY502924 

SARS coronavirus TW11 

AY502925 

SARS coronavirus TW2 

AY502926 

SARS coronavirus TW3 

AY502927 

SARS coronavirus TW4 

AY502928 

SARS coronavirus TW5 

AY502929 

SARS coronavirus TW6 

AY502930 

SARS coronavirus TW7 

AY502931 

SARS coronavirus TW8 

AY502932 

SARS coronavirus TW9 

AY508724 

SARS coronavirus NS 1 

AY515512 

SARS coronavirus HC SZ 61 03 

AY545914 

SARS coronavirus HC SZ 79 03 

AY545915 

SARS coronavirus HC SZ DM1 03 

AY545916 

SARS coronavirus HC SZ 266 03 

AY545917 

SARS coronavirus HC GZ 81 03 

AY545918 

SARS coronavirus HC GZ 32 03 

AY545919 

SARS coronavirus CFB SZ 94 03 

AY559082 

SARS coronavirus Sin852 

AY559084 

SARS coronavirus Sin3765V 

AY559085 

SARS coronavirus Sin848 

AY559086 

SARS coronavirus Sin849 

AY559093 

SARS coronavirus Sin845 

AY559095 

SARS coronavirus Sin847 

AY559096 

SARS coronavirus Sin850 

AY567487 

Human Coronavirus NL63 

AY568539 

SARS coronavirus GZ0401 

AY572034 

SARS coronavirus civet007 

AY572035 

SARS coronavirus civetOlO 

AY572038 

SARS coronavirus civet020 

AY613947 

SARS coronavirus GZ0402 

AY613948 

SARS coronavirus PC4 13 

AY613949 

SARS coronavirus PC4136 

AY613950 

SARS coronavirus PC4227 

AY654624 

SARS coronavirus TJF 

AY686863 

SARS coronavirus A022 

AY686864 

SARS coronavirus B039 

DQ022305 

Bat SARS coronavirus HKU3 1 

DQ071615 

Bat SARS coronavirus Rp3 

DQ084199 

Bat SARS coronavirus HKU3 2 

DQ084200 

Bat SARS coronavirus HKU3 3 

DQ412043 

Bat SARS coronavirus Rml 

DQ648857 

Bat coronavirus BtCoV 279 2005 

NC 004718 

SARS coronavirus Toronto 2 


with a genetical algorithm can shorten the computation 
time necessary to find satisfactory results (treated 
below). 

Initial tree build strategies under direct optimization 

We analyzed the 83 (Figs 1 and 4; Table 1) and 157 
(Figs 2 and 5; Table 2) isolate data sets with direct 
optimization into phylogenetic trees as implemented in 
POY3 on a 16 processor cluster of Linux PC based 


workstations running in parallel over a gigabit Ethernet 
switch. We used both parallel build and multibuild 
strategies (Janies and Wheeler, 2001). (POY3 parallel 
build commands: -parallel -replicates 9 
-fitchtrees -quick -staticapprox -notbr 
-maxtrees 10). (POY3 multibuild commands: 
parallel -multibuild -buildsperreplicate 
16 -approxbuild -nodiscrepancies -noran 
domizeoutgroup -sprmaxtrees 2 -tbrmaxtrees 
2 -fitchtrees -holdmaxtrees 2 -quick 
-staticapprox -replicates 2 -buildmax 

trees 2). 

Genetical algorithms under direct optimization 

Next, we used POY3 to perform tree fusion, a search 
heuristic first presented in a phylogenetic context by 
Goloboff (1999) to address the problem of composite 
optima. With a set of various near suboptimal trees such 
as produced during direct optimization analysis, often 
some taxa are in an optimal configuration in some of the 
trees but no one tree is optimal for all taxa. We applied 
the following POY3 commands to a concatenated file 
named “ALL. TREES” containing trees collected under 
various edit costs (POY3 commands: -parallel 
-fitchtrees -treefuse -fusemingroup 5-fuse 
maxtrees 10-fuselimit 100-slop 5-check 
slop 10-maxtrees 10-topofile ALL. TREES 
-molecularmatrix $ALIGNMENTPARAMETERS). 

Standard tree search for aligned data 

For the 114 isolate multiple alignment we ran a new 
technology search in TNT (Goloboff et al., 2003b) 
under equally weighted parsimony and stabilized the 
consensus 10 times (Fig. 6). We also ran these data 
under maximum likelihood under the GTR + GAM¬ 
MA and CAT models of nucleotide substitution for 
1000 randomly generated maximum parsimony trees in 
RAXML (Stamatakis, 2006) on a computing cluster. 

Character optimization on flat trees 

We optimized the position of the animal SARS-CoV 
isolates in the best tree(s) produced by tree fusion in 
each parameter set with the program MESQUITE 
(Maddison and Maddison, 2004) using the option: 
trace character history: parsimony ances¬ 
tral states. All best trees from the parameter study 
were used for study of the relative topological position 
of isolates in various hosts (Tables 4 and 5). 

For flat tree presentation of the optimization of: 
various 29-nucleotide fragments, key amino acid muta¬ 
tions, and host character states we used MESQUITE 
with trees for the 83 (Figs 1 and 4) and 157 isolate 
datasets (Figs 2 and 5, and supplemental data at http:// 
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Table 4 

Phylogenetic position of carnivore and swine relative to human SARS-CoV isolates in trees calculated under various edit costs under direct 
optimization for the 83 isolate data set 


Indel 

cost 

TV 

cost 

TS 

cost 

Tree 

length 

Position of SARS CoV isolated from carnivores 
and swine in tree 

1 

1 

1 

44737 

Terminal, nested within SARS CoV isolated from humans 

2 

2 

1 

71583 

Terminal, nested within SARS CoV isolated from humans 

2 

1 

1 

51209 

Terminal, nested within SARS CoV isolated from humans 

4 

2 

1 

82802 

Terminal, nested within SARS CoV isolated from humans 

8 

2 

1 

96851 

Terminal, nested within SARS CoV isolated from humans 


Table 5 

Phylogenetic position of carnivore and swine relative to human SARS-CoV isolates in trees calculated under various edit costs under direct 
optimization for the 157 isolate data set 


Indel 

cost 

TV 

cost 

TS 

cost 

Tree 

length 

Position of SARS CoV isolated from 
carnivores and swine in tree 

Position of SARS CoV isolated 
from Chiroptera in tree 

1 

1 

1 

60614 

Terminal, nested within SARS- 
CoV isolated from humans 

Basal to SARS-CoV isolated from 
humans, carnivores and swine 

2 

2 

1 

98057 

Terminal, nested within SARS- 
CoV isolated from humans 

Basal to SARS-CoV isolated from 
humans, carnivores and swine 

2 

1 

1 

74521 

Terminal, nested within SARS- 
CoV isolated from humans 

Basal to SARS-CoV isolated from 
humans, carnivores, and swine 

4 

2 

1 

123885 

Terminal, nested within SARS- 
CoV isolated from humans 

Basal to SARS-CoV isolated from 
humans, carnivores, and swine 

8 

2 

1 

154549 

Terminal, nested within SARS- 
CoV isolated from humans 

Most basal to SARS-CoV isolated from 
humans, carnivores, and swine. Two 
isolates from Chiroptera are terminal 


supramap.osu.edu/cov) produced by direct optimization 
under unitary edit costs (indels = 1, transversions = 1, 
transitions =1). 

For flat tree and geographic visualization studies 
(treated next) we used a binary version (using the TNT 
command randtree*) of the 114 isolate strict consen¬ 
sus tree produced by ClustalW alignment and parsi¬ 
mony search (Figs 3 and 6). 

Projection of a tree, key mutations and metadata into a 
virtual globe 

We used the methods described in Janies et al. (2007) to 
project a binary representation of the tree found for 114 
isolates in TNT into a virtual globe (http://supramap. 
osu.edu/cov/janiesetal2008covsars.kmz). One subtle dif¬ 
ference was that in this case we used an apomorphy list 
derived from PAUP* (version 4.0b 10; Swofford, 2002) 
using the command describe trees: output list 
of apomorphies. We drew data on host and date of 
isolation from Lau et al. (2005; GenBank, or the 
International Committee on Taxonomy of Viruses data¬ 
base (http://www.ncbi.nlm.nih.gov/ICTVdb). 

Spike protein mutations 

Not all nucleotide records for coronaviruses in 
GenBank had translations to proteins. To get amino 


acid data of interest we translated nucleotide records 
into proteins in the Genetic Data Environment (http:// 
www-bimas.cit.nih.gov/gde_sw.html) and checked these 
translations against reference amino acid sequences 
from GenBank. Amino acid sequences were aligned 
with ClustalW. Amino acid positions 479 and 487 of the 
spike protein were optimized on a tree using apomorphy 
commands of PAUP for tree projections. Optimizations 
of these amino acid positions were also conducted in 
MESQUITE for flat tree visualization (supplemental 
data at http://supramap.osu.edu/cov). 

Genotype-phenotype correlation studies 

We used the options: trace and chart of MACC- 
LADE (Maddison and Maddison, 2000) to perform the 
concentrated changes test (Maddison, 1990) with the 
presence of the region CCTACTGGTTACCAAC- 
CTGAATGGAATAT as the independent character 
and the infection of carnivores as the dependent charac¬ 
ter. Any ambiguities in the optimization were resolved 
using the DELTRAN option. The CCT test was per¬ 
formed using simulation sample size of 100 000 iterations. 

Sensitivity analysis of outgroup choice 

Rooting an evolutionary tree is a critical step to 
polarize the temporal sequence of genomic and 
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SARS-CoV TOR2 
SARS-CoV HSR1 
SARS-CoV TJF 
SARS-CoV GD69 
SARS-CoV HKU 39849 
SARS-CoV HKU 66078 
SARS-CoV HKU 65806 
SARS-CoV TWC 
SARS-CoV Frankfurtl 
SARS-CoV TW1 
SARS-CoV Urbani 
SARS-CoV ZJ01 
SARS-CoV SIN 2774 
SARS-CoV SIN 2748 
SARS-CoV SIN 2677 
SARS-CoV SIN 2500 
SARS-CoV SIN 2679 
SARS-CoV SZ13 racoon dog 
SARS-CoV SZ16 civet cat 
SARS-CoV SZ1 civet cat 
SARS-CoV SZ3 civet cat 
SARS-CoV GD03T0013 
SARS-CoV HC SZ 61 03 civet cat 
SARS-CoV BJ03 
SARS-CoV BJ02 
SARS-CoV BJ04 
SARS-CoV HZS2 Bb 
SARS-CoV GZ50 
SARS-CoV HKU 36871 
SARS-CoV BJ01 
SARS-CoV GZ60 
SARS-CoV GZ43 
SARS-CoV GZ02 
SARS-CoV GD01 
SARS-CoV ZS A 
SARS-CoV ZS B 
SARS-CoV HSZ Cc 
SARS-CoV HSZ Be 
SARS-CoV CUHK W1 
SARS-CoV LC5 
SARS-CoV LC4 
SARS-CoV LC3 
SARS-CoV LC2 
SARS-CoV CUHK AG02 
SARS-CoV CUHK AG01 
SARS-CoV CUHK Sul0 
SARS-CoV HZS2 Fc 
SARS-CoV HGZ8L2 
SARS-CoV HZS2 C 
SARS-CoV HZS2 E 
SARS-CoV HZS2 D 
SARS-CoV HZS2 Fb 
SARS-CoV GZ D 
SARS-CoV HSZ Cb 
SARS-CoV HSZ A 
SARS-CoV ZS C 
SARS-CoV HGZ8L1 A 
SARS-CoV HGZ8L1 B 
SARS-CoV HSZ2 A 
SARS-CoV GZ C 
SARS-CoV GZ B 
SARS-CoV HSZ Bb 
Bovine coronavirus 
Bovine coronavirus LUN 
Bovine coronavirus Mebus 
Bovine coronavirus Quebec 
Porcine hemagglutin encep virus 
Human coronavirus OC43 NL 
Human coronavirus OC43 
Murine hepatitis virus ML11 
Murine hepatitis virus 2 
Murine hepatitis virus Penn 971 
Murine hepatitis virus MHVA59 
Murine hepatitis virus ML10 
Rat sialodacryoadenitis coronavirus 
Human coronavirus NL63 
Human coronavirus 229E 
Porcine epidemic diarrhea virus 
Transmissible gastroenteritis virus 
Canine coronavirus 
Feline infectious peritonitis virus 
Avian infectious bronchitis virus 
Turkey coronavirus 


Fig. 1. Phylogenetic tree produced by direct optimization of 83 coronavirus isolates based on whole and partial genomes (sampling in Table 1). 
Branches with black traces indicate presence of the 29-nucleotide region, CCTACTGGTTACCAACCTGAATGGAATAT (e.g., positions 27869- 
27897 in AY278489) in an uncharacterized protein of variants of the SARS-CoV that infect small carnivores and humans. White traces indicate the 
absence of this region. In this analysis, the evolution of insertions and deletions of this region is labile and complex. 
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29-nucleotide region 

I I absent 

■ CCTACTGGTTACCAACCTGAATGGAATAT 
[ CCAATACATTACTATTCGGACTGGTTTAT 









SARS-CoV ZSB 
SARS-CoV ZS A 
SARS-CoV ZS C 
SARS-CoV JMD 
SARS-CoV HGZ8L1 B 
SARS-CoV GZ C 
SARS-CoV GZ B 
SARS-CoV Sin852 
SARS-CoV Sin849 
SARS-CoV Sin2677 
SARS-CoV Sin2500 
SARS-CoV WHU 
SARS-CoV TWC 
SARS-CoV Sin2748 
SARS-CoV SoD 
SARS-CoV Frankfurt 1 
SARS-CoV Sin2774 
SARS-CoV Sin848 
SARS-CoV Sin847 
SARS-CoV Sin845 
SARS-CoV Sin850 
SARS-CoV Sin2679 
SARS-CoV Taiwan TC3 
SARS-CoV Taiwan TC2 
SARS-CoV Taiwan TCI 
SARS-CoV BJ03 
SARS-CoV BJ02 
SARS-CoV BJ04 
SARS-CoV HZS2 Bb 
SARS-CoV ShanghaiQXCI 
SARS-CoV ZJ01 
SARS-CoV Urbani 
SARS-CoV AS 
SARS-CoV TWY 
SARS-CoV TWS 
SARS-CoV TWK 
SARS-CoV TWJ 
SARS-CoV TW9 
SARS-CoV TW11 
SARS-CoV TW10 
SARS-CoV TW6 
SARS-CoV TWH 
SARS-CoV TW7 
SARS-CoV TW8 
SARS-CoV TW5 
SARS-CoV TW4 
SARS-CoV TW3 
SARS-CoV TW2 
SARS-CoV TW1 
SARS-CoV GZ60 
SARS-CoV GZ43 
SARS-CoV HKU 36871 
SARS-CoV GZ A 
SARS-CoV GZ50 
SARS-CoV HKU 66078 
SARS-CoV HKU 65806 
SARS-CoV Sin3765V 
SARS-CoV FRA 
SARS-CoV HKU 39849 
SARS-CoV Sino3 11 
SARS-CoV GD69 
SARS-CoV Sinol 11 
SARS-CoV BJ01 
SARS-CoV TJF 
SARS-CoV NS 1 
SARS-CoV HZS2 Fc 
SARS-CoV HZS2 Fb 
SARS-CoV PUMC03 
SARS-CoV PUMC02 
SARS-CoV PUMC01 
SARS-CoV CUHK Sul 0 
SARS-CoV CUHKAG02 
SARS-CoV CUHK AG01 
SARS-CoV CUHK AG03 
SARS-CoV LC1 
SARS-CoV GZ D 
SARS-CoV LC5 
SARS-CoV LC3 
SARS-CoV LC4 
SARS-CoV LC2 
SARS-CoV Toronto 2 
SARS-CoV HSR 
SARS-CoV HZS2 E 
SARS-CoV HGZ8L2 
SARS-CoV HZS2 C 
SARS-CoV HSZ2 A 
SARS-CoV HZS2 D 
SARS-CoV CUHK W1 
SARS-CoV HSZ Cc 
SARS-CoV HSZ Cb 
SARS-CoV HSZ Bb 
SARS-CoV HSZ A 
SARS-CoV HSZ Be 
SARS-CoV HGZ8L1 A 
SARS-CoV PC4 205 
SARS-CoV PC4 136 
SARS-CoV GZ0403 
SARS-CoV PC4 199 
SARS-CoV PC4 13 
SARS-CoV GZ0401 
SARS-CoV GD03T0013 
SARS-CoV PC4 115 
SARS-CoV GZ0402 
SARS-CoV PC4 241 
SARS-CoV PC4 145 
SARS-CoV PC4 227 
SARS-CoV HCGZ 81 03 
SARS-CoV PC4 137 
SARS-CoV PC4 127 
SARS-CoV HC GZ 32 03 
SARS-CoV CFB SZ 94 03 
SARS-CoV civet020 
SARS-CoV HC SZ 266 03 
SARS-CoV HCSZ DM1 03 
SARS-CoV HC SZ 79 03 
SARS-CoV civet007 
SARS-CoV A022 
SARS-CoV ci veto 10 
SARS-CoV B039 
SARS-CoV HC SZ 61 03 
SARS-CoV SZ16 
SARS-CoV SZ13 
SARS-CoV SZ1 
SARS-CoV SZ3 
SARS-CoV GZ02 
SARS-CoV GD01 
Bat SARS-CoV Rfl 
Bat SARS-CoV Rp2 
Bat SARS-CoV Rpl 
Bat SARS-CoV Rp3 
Bat SARS-CoV HKU3 3 
Bat SARS-CoV HKU3 1 
Bat SARS-CoV HKU3 2 
Bat CoV BtCoV 279 2005 
Bat SARS-CoV Rml 
Bovine CoV Quebec 
Bovine CoV Mebus 
Bovine CoV LUN 
Bovine CoV 

Porcine hemaqqlutinatinq encep 
Human CoV strain OC43 
Human CoV OC43 
Murine hepatitis ML 11 
Murine hepatitis 2 
Murine hepatitis Penn 97 1 
Rat sialodacryoadenitis CoV 
Murine hepatitis ML 10 
Murine hepatitis virus 
Transmissible gastroenteritis 
Feline infectious peritonitis 
Canine CoV 
Human CoV 229E 
Human CoV NL63 
Porcine epidemic diarrhea virus 
Bat CoV strain 61 
Turkey CoV 

Avian infectious bronchitis 


Fig. 2. Phylogenetic tree produced by direct optimization of whole and partial coronavirus genomes produced of 157 isolates (sampling in Table 2). 
Branches with black traces indicate presence of the 29-nucleotide region, CCTACTGGTTACCAACCTGAATGGAATAT (e.g., positions 27869- 
27897in AY278489) in an uncharacterized protein of variants of the SARS-CoV that infect small carnivores and humans. Branches with green traces 
indicate the presence of the 29-nucleotide region CCAATACATTACTATTCGGACTGGTTTAT (e.g., positions 27866-27894 in DQ648857) in an 
uncharacterized protein of all SARS-CoV isolated from Chiroptera. White traces indicate the absence of either region. In this analysis, the evolution 
of insertions and deletions of these regions is labile and complex. 
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29-nucleotide region 


absent 

CCTACTGGTTACCAACCTGAATGGAATAT 

CCAATACATTACTATTCGGACTGGTTTAT 




SARS CoV HC SZ 79 03 
SARS CoV A022 
SARS CoV civet007 
SARS CoV civetOlO 
SARS CoV B039 
SARS CoV GZ0402 
SARS CoV GZ0401 
SARS CoV HC GZ 32 03 
SARS CoV civet020 

► SARS CoV HC SZ 266 03 
SARS CoV PC4 136 
SARS CoV PC4 13 
SARS CoV HC SZ 61 03 
SARS CoVCFBSZ 94 03 
SARS CoV HC SZ DM1 03 
SARS CoV PC4 227 
SARS CoV HC GZ 81 03 
SARS CoV SZ3 

SARS CoV SZ16 
SARS CoV GZ02 
SARS CoVGDOl 
SARS CoV HGZ8L1 A 
SARS CoV HSZ Cc 
SARS CoV HSZ Cb 
SARS CoV HSZ Be 
SARS CoV HSZ Bb 
2 SARS CoVCUHKWI 
SARS CoV ZS C 
SARS CoV ZS A 

2 SARS CoV ZS B 

3 SARS CoVJMD 

2 SARS CoV HGZ8L1 B 
SARS CoV HZS2 D 
SARS CoV HGZ8L2 
2 SARS CoV HSZ2 A 
2 SARS CoV HZS2 E 
SARS CoV HZS2 Bb 
SARS CoV GZ50 
2 SARS CoV HZS2 C 
SARS CoV Sin848 
SARS CoV Sin847 

2 SARS CoV Sin845 
SARS CoV Sin852 
SARS CoV Sin3765V 

> SARS CoV Sin850 

3 SARS CoV Sin2679 
SARS CoV HZS2 Fc 
SARS CoV HZS2 Fb 
SARS CoV Sin2500 
SARS CoV ShanghaiQXCI 

2 SARS CoV HSR 
SARS CoV BJ03 
SARS CoV BJ02 
2 SARS CoV BJ01 
SARS CoVTW11 
SARS CoV Taiwan TC3 
5 SARS CoVTWS 
2 SARS CoVTWJ 
2 SARS CoV Taiwan TC2 
2 SARS CoV TW9 
2 SARS CoVTWK 
SARS CoV TWY 
SARS CoVTWIO 
SARS CoVTWH 
SARS CoV TW7 
SARS CoVTaiwan TCI 
SARS CoV CUHK AG02 
SARS CoV TW8 
SARS C0VTW6 
SARS CoV GD69 
SARS CoV CUHK AG03 
2 SARS CoV CUHK AG01 
SARS CoV Sino3 11 
SARS CoV PUMC02 
SARS CoV Sinol 11 
SARS CoV PUMC03 
2 SARS CoV PUMC01 
2 SARS CoV LC1 
2 SARS CoV CUHK Sul 0 
2SARSCoV TW5 
SARS CoV WHU 
SARS CoV TWC 
2 SARS CoV HKU 39849 
SARS CoV Frankfurt 1 
SARS CoV FRA 
2 SARS CoV SoD 
2 SARS CoV Sin2774 
2 SARS CoVZJOl 
SARS CoV TW4 
SARS CoV TW3 
2 SARS CoVTWI 
2 SARS CoV TW2 
SARS CoV NS 1 
SARS CoV BJ04 
2 SARS CoV Toronto 2 
SARS CoV Urbani 
SARS CoVTJF 
2 SARS CoV AS 
2 SARS CoV Sin2748 
2 SARS CoV Sin2677 
SARS CoV GZ C 
SARS CoV GZ B 
2 SARS CoV Sin849 
SARS CoV LC5 
SARS CoV LC4 
2 SARS CoV LC3 
2 SARS CoV LC2 
Bat SARS CoV HKU3 3 
Bat SARS CoV HKU3 1 
Bat SARS CoV HKU3 2 
Bat SARS CoV Rml 
Bat CoV BtCoV 279 2005 
Bat SARS CoV Rp3 
2 Human CoVNL63 


Fig. 3. Binary representation of strict consensus tree produced by multiple alignment followed by tree search under parsimony of 114 whole 
coronavirus genomes. Branches with black traces indicate presence of the 29-nucleotide region, CCTACTGGTTACCAACCTGAATGGAATAT 
(e.g., positions 27869-27897 in AY278489) in an uncharacterized protein of variants of the SARS-CoV that infect small carnivores and humans. 
Branches with green traces indicate the presence of the 29-nucleotide region CCAATACATTACTATTCGGACTGGTTTAT (e.g., positions 27866- 
27894 in DQ648857) in an uncharacterized protein of all SARS-CoV isolated from Chiroptera. White traces indicate the absence of either region. In 
this analysis the evolution of insertions and deletions of these regions is simple. 


phenotypic changes and clarify the relationships of the 
organisms. Unlike Snijder et al. (2003) who used an 
equine torovirus outgroup (as the taxonomy suggests 
might be suitable http://www.ncbi.nlm.nih.gov/ICT- 
Vdb/Ictv/index.htm), we could not verify the suitabil¬ 
ity of an outgroup from outside the coronaviruses. 
Our investigation using BLAST (Altschul et al., 1997) 


[default values as implemented in GenBank http:// 
www.ncbi.nlm.nih.gov (i.e., expect = 10)] indicated to 
us that no arterivirus or torovirus genome in Gen- 
Bank bears significant nucleotide similarity with any 
coronavirus. As outgroups, we used genomes 
and partial genomes from non-SARS coronaviruses 
(Tables 1, 2 and 3). We choose many candidate 
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Host 

M Carnivora 
H Human 
WM Aves 
H Artylodactlya 
□□ Rodentia 



[ 




SARS-CoVTOR2 
SARS-CoV HSR1 
SARS-CoVTJF 
SARS-CoV GD69 
SARS-CoV HKU 39849 
SARS-CoV HKU 66078 
SARS-CoV HKU 65806 
SARS-CoV TWC 
SARS-CoV Frankfurtl 
SARS-CoV TW1 
SARS-CoV Urbani 
SARS-CoV ZJ01 
SARS-CoV SIN 2774 
SARS-CoV SIN 2748 
SARS-CoV SIN 2677 
SARS-CoV SIN 2500 
SARS-CoV SIN 2679 



A 




» 



£ 

I 



m 



m 


SARS-CoV SZ13 racoon dog 
SARS-CoV SZ16 civet cat 
SARS-CoV SZ1 civet cat 
SARS-CoV SZ3 civet cat 
SARS-CoV GD03T0013 
SARS-CoV HC SZ 61 03 civet cat 
SARS-CoV BJ03 
SARS-CoV BJ02 
SARS-CoV BJ04 
SARS-CoV HZS2 Bb 
SARS-CoV GZ50 
SARS-CoV HKU 36871 
SARS-CoV BJ01 
SARS-CoV GZ60 
SARS-CoV GZ43 
SARS-CoV GZ02 
SARS-CoV GD01 
SARS-CoV ZS A 
SARS-CoV ZS B 
SARS-CoV HSZ Cc 
SARS-CoV HSZ Be 
SARS-CoV CUHKW1 
SARS-CoV LC5 
SARS-CoV LC4 
SARS-CoV LC3 
SARS-CoV LC2 
SARS-CoV CUHKAG02 
SARS-CoV CUHKAG01 
SARS-CoV CUHKSu 10 
SARS-CoV HZS2 Fc 
SARS-CoV HGZ8L2 
SARS-CoV HZS2 C 
SARS-CoV HZS2 E 
SARS-CoV HZS2 D 
SARS-CoV HZS2 Fb 
SARS-CoV GZ D 
SARS-CoV HSZ Cb 
SARS-CoV HSZ A 
SARS-CoV ZS C 
SARS-CoV HGZ8L1 A 
SARS-CoV HGZ8L1 B 
SARS-CoV HSZ2 A 
SARS-CoV GZ C 
SARS-CoV GZ B 
SARS-CoV HSZ Bb 



Bovine coronavirus 
Bovine coronavirus LUN 
Bovine coronavirus Mebus 
Bovine coronavirus Quebec 
Porcine hemagglutin encep virus 
Human coronavirus OC43 NL 
Human coronavirus OC43 
Murine hepatitis virus ML11 
Murine hepatitis virus 2 
Murine hepatitis virus Penn 971 
Murine hepatitis virus MHVA59 
Murine hepatitis virus ML10 
Rat sialodacryoadenitis coronavirus 
Human coronavirus NL63 
Human coronavirus 229E 
Porcine epidemic diarrhea virus 
Transmissible gastroenteritis virus 
Canine coronavirus 
Feline infectious peritonitis virus 
Avian infectious bronchitis virus 
Turkey coronavirus 


Fig. 4. Phylogenetic tree produced by direct optimization of 83 coronavirus isolates based on whole and partial genomes (sampling in Table 1). The 
evolution of hosts is optimized on the genome-based tree as shown by the colors traced on the branches. Note that the SARS-CoV isolates from 
carnivores (purple trace: civet cat Parguma larvata, raccoon dog Nyctereutes procyonoides, and ferret badger Melogale moschata ) and artiodactyls 
(light blue trace: pig, Sus scrofa ) are nested within a large clade of SARS-CoV isolates from humans (yellow trace: Homo sapiens ), which are basal 
among SARS-CoV. The search method for the genomic data was direct optimization. Parsimony optimization was used for the host data. The edit 
costs were indels 1, transversions 1, transitions 1. 


outgroup taxa to maximize host and antigenic diver¬ 
sity. Clades formed by antigenic group 1, group 2, 
and group 3 coronaviruses have significant branch 
lengths between each other and the SARS-CoV clade. 
Finding the ingroup root when the available out¬ 
groups are markedly divergent can be challenging. The 
divergence can be a result of rapid mutation rates, 
recombination events, inadequate sampling, multiple 
evolutionary origins, or a combination of these 
phenomena. Thus we performed several experimental 
searches in which a random outgroup selected from 
non-SARS taxa was used. The results of these 
searches were assessed to see whether our phylogenetic 
and host evolution results were affected by outgroup 


choice. To perform these randomization experiments, 
we output an implied alignment (Wheeler, 2003) 
resulting from each parameter set and best tree. 
(POY3 commands: -phastwincladfile $IM- 
PLIEDALIGNMENT.phast -topodiagnoseonly - 
topofile $ AL IGNME N T P ARAME T E R S . TREE). Next, 
for each implied alignment we used 1000 replicate new 
technology tree searches (TNT command: XMULT 2) 
(Goloboff et al., 2003b). In each search replicate, we 
randomly deleted a subset of the outgroup taxa and 
assessed: (1) whether the most basal taxon in the 
SARS ingroup was stable, and (2) whether the most 
basal taxon of the SARS ingroup was ever an isolate 
from an animal host (scripts available from the authors). 
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Host 

H Chiroptera 
na Carnivora 

□ Human 
H Aves 

□ Artylodactlya 

□ Rodentia 
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SARS-CoV ZS B 
SARS-CoV ZS A 
SARS-CoV ZS C 
SARS-CoV JMD 
SARS-CoV HGZ8L1 B 
SARS-CoV GZ C 
SARS-CoV GZ B 
SARS-CoV Sin852 
SARS-CoV Sin849 
SARS-CoV Sin2677 
SARS-CoV Sin2500 
SARS-CoV WHU 
SARS-CoV TWC 
SARS-CoV Sin2748 
SARS-CoV SoD 
SARS-CoV Frankfurt 1 
SARS-CoV Sin2774 
SARS-CoV Sin848 
SARS-CoV Sin847 
SARS-CoV Sin845 
SARS-CoV Sin850 
SARS-CoV Sin2679 
SARS-CoV Taiwan TC3 
SARS-CoV Taiwan TC2 
SARS-CoV Taiwan TCI 
SARS-CoV BJ03 
SARS-CoV BJ02 
SARS-CoV BJ04 
SARS-CoV HZS2 Bb 
SARS-CoV ShanghaiQXCI 
SARS-CoV ZJ01 
SARS-CoV Urbani 
SARS-CoV AS 
SARS-CoV TWY 
SARS-CoV TWS 
SARS-CoV TWK 
SARS-CoV TWJ 
SARS-CoV TW9 
SARS-CoV TW11 
SARS-CoV TW10 
SARS-CoV TW6 
SARS-CoV TWH 
SARS-CoV TW7 
SARS-CoV TW8 
SARS-CoV TW5 
SARS-CoV TW4 
SARS-CoV TW3 
SARS-CoV TW2 
SARS-CoV TW1 
SARS-CoV GZ60 
SARS-CoV GZ43 
SARS-CoV HKU 36871 
SARS-CoV GZ A 
SARS-CoV GZ50 
SARS-CoV HKU 66078 
SARS-CoV HKU 65806 
SARS-CoV Sin3765V 
SARS-CoV FRA 
SARS-CoV HKU 39849 
SARS-CoV Sino3 11 
SARS-CoV GD69 
SARS-CoV Sinol 11 
SARS-CoV BJ01 
SARS-CoV TJF 
SARS-CoV NS 1 
SARS-CoV HZS2 Fc 
SARS-CoV HZS2 Fb 
SARS-CoV PUMC03 
SARS-CoV PUMC02 
SARS-CoV PUMC01 
SARS-CoV CUHK Sul 0 
SARS-CoV CUHKAG02 
SARS-CoV CUHK AG01 
SARS-CoV CUHK AG03 
SARS-CoV LC1 
SARS-CoV GZ D 
SARS-CoV LC5 
SARS-CoV LC3 
SARS-CoV LC4 
SARS-CoV LC2 
SARS-CoV Toronto 2 
SARS-CoV HSR 
SARS-CoV HZS2 E 
SARS-CoV HGZ8L2 
SARS-CoV HZS2 C 
SARS-CoV HSZ2 A 
SARS-CoV HZS2 D 
SARS-CoV CUHK W1 
SARS-CoV HSZ Cc 
SARS-CoV HSZCb 
SARS-CoV HSZ Bb 
SARS-CoV HSZ A 
SARS-CoV HSZ Be 
SARS-CoV HGZ8L1 A 
SARS-CoV PC4 205 
SARS-CoV PC4 136 
SARS-CoV GZ0403 
SARS-CoV PC4 199 
SARS-CoV PC4 13 
SARS-CoV GZ0401 
SARS-CoV GD03T0013 
SARS-CoV PC4 115 
SARS-CoV GZ0402 
SARS-CoV PC4 241 
SARS-CoV PC4 145 
SARS-CoV PC4 227 
SARS-CoV HCGZ 81 03 
SARS-CoV PC4 137 
SARS-CoV PC4 127 
SARS-CoV HC GZ 32 03 
SARS-CoV CFB SZ 94 03 
SARS-CoV civet020 
SARS-CoV HC SZ 266 03 
SARS-CoV HC SZ DM1 03 
SARS-CoV HC SZ 79 03 
SARS-CoV civet007 
SARS-CoV A022 
SARS-CoV civetOlO 
SARS-CoV B039 
SARS-CoV HC SZ 61 03 
SARS-CoV SZ16 
SARS-CoV SZ13 
SARS-CoV SZ1 
SARS-CoV SZ3 
SARS-CoV GZ02 
SARS-CoV GD01 
Bat SARS-CoV Rfl 
Bat SARS-CoV Rp2 
Bat SARS-CoV Rpl 
Bat SARS-CoV Rp3 
Bat SARS-CoV HKU3 3 
Bat SARS-CoV HKU3 1 
Bat SARS-CoV HKU3 2 
Bat CoV BtCoV 279 2005 
Bat SARS-CoV Rml 
Bovine CoV Quebec 
Bovine CoV Mebus 
Bovine CoV LUN 
Bovine CoV 

Porcine hemagglutinating encep 
Human CoV strain OC43 
Human CoV OC43 
Murine hepatitis ML 11 
Murine hepatitis 2 
Murine hepatitis Penn 97 1 
Rat sialodacryoadenitis CoV 
Murine hepatitis ML 10 
Murine hepatitis virus 
Transmissible gastroenteritis 
Feline infectious peritonitis 
Canine CoV 
Human CoV 229E 
Human CoV NL63 
Porcine epidemic diarrhea virus 
Bat CoV strain 61 
Turkey CoV 

Avian infectious bronchitis 


Fig. 5. Phylogenetic tree produced by direct optimization of whole 
and partial coronavirus genomes produced of 157 isolates (sampling in 
Table 2). Note that the SARS-CoV isolates from Chiroptera (black 
trace: Rhinolophus sinicus, Rhinolophus ferrumequinum , Rhinolophus 
macrotis and Rhinolophus pearsoni ) are basal among the entire SARS- 
CoV clade. SARS-CoV isolates from small carnivores (purple trace) 
and artiodactyls (light blue trace) are nested within a clade of SARS- 
CoV isolates from humans (yellow trace), although there were several 
exchanges between humans and carnivores. The search method for the 
genomic data was direct optimization. Parsimony optimization was 
used for the host data. The edit costs were indels 1, transversions 1, 
transitions 1. 


Resampling 

We performed jackknife GC resampling in TNT 
(Goloboff et al., 2003a,b) on the ClustalW alignment of 
the 114 isolate data set and the implied alignment from 
unitary costs for the 83 and 157 isolate data sets as 
specified by the following commands: resample jak 
replOOO [ xm = lev5 rep5] from 0. 

We performed 1000 bootstrap resampling replicates in 
RAXML (Stamatakis, 2006) with the following com¬ 
mands: -f d -m GTRCAT - 1000 -b 12345 -n Mul 
tipleBootstrap. 


Results 

Direct optimization searches 

Best tree lengths for the direct optimization searches 
under various parameters are reported for the 83 isolate 
data set in Table 4 and for the 157 isolate data set in 
Table 5. The resampling values are reported as supple¬ 
mental data at http://supramap.osu.edu/cov/. 

Multiple alignment to standard tree search 

For the 114 isolate data set, a best score of 22 363 steps 
under equally weighted parsimony was hit 107 times and 
87 trees were retained. A strict consensus of 59 nodes was 
stabilized 10 times (Fig. 6). The best RAXML tree for this 
alignment was found under GTRGAMMA at -In likeli¬ 
hood of 111006.264984. RAXML trees with host char¬ 
acter optimization and resampling values are available in 
supplemental data at http://supramap.osu.edu/cov/. 

Evolution of host shifts among coronaviruses 

In the 83 isolate data set in all parameter sets 
considered, we found the SARS-CoV isolates from 
P. larvata , N. procyonoides (Carnivora) and Sus scrofa 
(Artiodactyla) to occur in terminal positions of the trees, 
nested well within a large clade of SARS-CoV isolated 
from humans (Fig. 4, Table 4). Thus, based on genomic 
evidence, SARS-CoV occurred in P. larvata , N. procyo- 
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Host of Cov 


Artiodactyla 

Carnivora 


Chiroptera 

Human 



SARS-CoV HC SZ 79 03 
SARS-CoV A022 
SARS-CoV civet007 
SARS-CoV civetOlO 
SARS-CoV B039 
SARS-CoV GZ0402 
SARS-CoV GZ0401 
SARS-CoV HC GZ 32 03 
SARS-CoV civet020 
SARS-CoV HC SZ 266 03 
SARS-CoV PC4 136 
SARS-CoV PC4 13 
SARS-CoV HCSZ 61 03 
SARS-CoV CFB SZ 94 03 
SARS-CoV HCSZ DM1 03 
SARS-CoV PC4 227 
SARS-CoV HCGZ 81 03 
SARS-CoV SZ3 
SARS-CoV SZ16 
SARS-CoV GZ02 
SARS-CoV GD01 
» SARS-CoV HGZ8L1 A 
SARS-CoV HSZ Cc 
SARS-CoV HSZ Cb 
SARS-CoV HSZ Be 
SARS-CoV HSZ Bb 
SARS-CoV CUHKW1 
SARS-CoV ZS C 
SARS-CoV ZS A 
SARS-CoV ZS B 
SARS-CoV JMD 
SARS-CoV HGZ8L1 B 
SARS-CoV HZS2 D 
SARS-CoV HGZ8L2 
SARS-CoV HSZ2 A 
SARS-CoV HZS2 E 
SARS-CoV HZS2 Bb 
SARS-CoV GZ50 
SARS-CoV HZS2 C 
SARS-CoV Sin848 
SARS-CoV Sin847 
SARS-CoV Sin845 
SARS-CoV Sin852 
SARS-CoV Sin3765V 
SARS-CoV Sin850 
i SARS-CoV Sin2679 
SARS-CoV HZS2 Fc 
SARS-CoV HZS2 Fb 
SARS-CoV Sin2500 
SARS-CoV ShanghaiQXCI 
i SARS-CoV HSR 
SARS-CoV BJ03 
SARS-CoV BJ02 
i SARS-CoV BJ01 
SARS-CoV TW11 
SARS-CoV Taiwan TC3 
t SARS-CoV TWS 
SARS-CoV TWJ 
SARS-CoV Taiwan TC2 
SARS-CoV TW9 

• SARS-CoV TWK 
SARS-CoV TWY 
SARS-CoV TW10 
SARS-CoV TWH 
SARS-CoV TW7 
SARS-CoV Taiwan TCI 
SARS-CoV CUHKAG02 
SARS-CoV TW8 
SARS-CoV TW6 
SARS-CoV GD69 
SARS-CoV CUHKAG03 

» SARS-CoV CUHKAG01 
SARS-CoV Sino3 11 
SARS-CoV PUMC02 
SARS-CoV Sinol 11 
SARS-CoV PUMC03 

• SARS-CoV PUMC01 
SARS-CoV LC1 
SARS-CoV CUHK Sul 0 
SARS-CoV TW5 
SARS-CoV WHU 
SARS-CoV TWC 

» SARS-CoV HKU 39849 
SARS-CoV Frankfurt 1 
SARS-CoV FRA 
SARS-CoV SoD 
SARS-CoV Sin2774 
SARS-CoV ZJ01 
SARS-CoV TW4 
SARS-CoV TW3 
SARS-CoV TW1 
SARS-CoV TW2 
SARS-CoV NS 1 
SARS-CoV BJ04 
SARS-CoV Toronto 2 
SARS-CoV Urbani 
SARS-CoV TJF 

• SARS-CoV AS 
SARS-CoV Sin2748 
SARS-CoV Sin2677 
SARS-CoV GZ C 
SARS-CoV GZ B 
SARS-CoV Sin849 
SARS-CoV LC5 
SARS-CoV LC4 
SARS-CoV LC3 
SARS-CoV LC2 
Bat SARS-CoV HKU3 3 
Bat SARS-CoV HKU3 1 
Bat SARS-CoV HKU3 2 
Bat SARS-CoV Rml 
Bat CoV BtCoV 279 2005 
Bat SARS-CoV Rp3 
Human CoV NL63 


Fig. 6. Note that the SARS-CoV isolates from Chiroptera (black trace) are basal to the entire SARS-CoV clade. The SARS-CoV isolates from 
carnivores (purple trace) and artiodactyls (light blue trace) are nested within a large clade of SARS-CoV isolates from humans (yellow trace), 
although there were exchanges of SARS-CoV between humans and carnivores. The tree search and character optimization were conducted under 
equally weighted parsimony. 


noides and S. scrofa after SARS-CoV occurred in 
humans (Figs. 4). The shift of SARS-CoV from human 
hosts to S. scrofa host is independent of the shift from 
human host to small carnivore hosts (A. procyonoides 
and S. scrofa). 

In the 83 isolate tree recovered under unitary costs, the 
polarity of host shift is ambiguous between the SARS- 


CoV isolate from N. procyonoides (HC/SZ/61/03) and 
the SARS-CoV isolate GD03T0013 from humans. 
GD03T0013 is closely related to SARS-CoV isolated 
from civets served in a restaurant in Guangzhou, China 
in late 2003 and early 2004. No epidemiological data link 
the GD03T0013 human case to exposure to laboratory 
isolates of SARS-CoV (Wang et al., 2005). 
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In the 157 isolate data set, under all parameters we 
found the SARS-CoV isolates from P. larvata , 
N. procyonoides and S. scrofa were terminal, nested well 
within a large clade of SARS-CoV isolated from humans 
(Fig. 5, Table 5). In the analysis of these data under most 
parameter sets the SARS-CoV isolated from Chiroptera 
were basal to SARS-CoV isolated from humans, carni¬ 
vores and swine. A solitary minor exception to this 
pattern occurred under an extremely biased edit cost 
model of indels 8, transversions 2, transitions 1 (Table 5). 
In this analysis, two of four isolates of SARS-CoV from 
Chiroptera occur in terminal rather than basal positions. 

In the 157 isolate tree recovered under unitary costs, 
the human SARS-CoV isolate GD03T0013 is closely 
related to civet as well as human isolates SARS-CoV. 
This is consistent with the result that there were 
bidirectional exchanges of SARS-CoV between humans 
and carnivores. 

The 114 isolate trees that result from analyses using 
multiple alignment and standard tree searches under 
parsimony and maximum likelihood show a pattern of 
host shifts similar to those described for the direct 
optimization searches. SARS-CoV isolated from Chi¬ 
roptera are basal to SARS-CoV under alignment plus 
parsimony search or alignment plus maximum likeli¬ 
hood search. In all results from the 114 isolate data set 
SARS-CoV isolated from carnivores are terminal and 
nested within a large clade of SARS-CoV isolated from 
humans and there is evidence of bidirectional exchange 
of SARS-CoV between humans and carnivores (Fig. 6 
and supplemental data at http://supramap.osu.edu/ 
cov). 

Evolution of a labile region of the SARS-CoV genome 

In all three isolate sampling regimes the first insertion 
of the 29-nucleotide region, CCTACTGGTTAC- 
CAACCTGAATGGAATAT, occurs phylogenetically 
basal to the clade exhibiting the earliest hosts shift 
among humans and carnivores. However, the result of 
whether this region covaries with host shifts is depen¬ 
dent on isolate sampling regime. 

Locus insertion and deletion among SARS-CoV from 
various hosts in the 83 isolate data set 

We present the phylogeny for 83 isolates found 
under unitary costs with tracing depicting the complex 
pattern of presence and absence of the 29-nucleotide 
region CCT ACT GGTT ACC A ACCT GA AT GG A A 
TAT (Fig. 1). The pattern of insertion and deletion of 
the 29-nucleotide region region includes four to eight 
insertions and zero to four deletions. However, two host 
shifts from human to carnivore occur in concert 
with insertions of the 29-nucleotide region (Fig. 4). 
Using Maddison’s (1990) concentrated changes 


test, we find statistically significant correlation between 
this 29-nucleotide region and host shifts (CCT = 
0.0123). 

Locus insertion and deletion among SARS-CoV in the 157 
isolate data set 

We optimized the presence of 29 nucleotide sequence 
regions CCT ACT GGTT ACC A ACCT G A AT GG A A- 
TAT and CCA AT AC ATT ACT ATTCGG ACT GGTT - 
TAT over the tree calculated for 157 isolates under 
unitary costs (Fig. 2). The region CCAATACATTAC- 
T ATTCGG ACT GGTTT AT occurs in all wholly 
sequenced genomes of SARS-CoV isolated from 
Chiroptera and is well correlated with this host. In 
contrast, the region CCT ACT GGTT ACC A ACCT - 
GAATGGAATAT is inserted seven to eight times and 
deleted four to five times. In terms of host use in this 
tree, there are five shifts from carnivore to human hosts 
and two changes from human to carnivore hosts 
(Fig. 5). Among all these changes in the presence of 
the 29-nucleotide region, CCT ACT GGTT ACC A A- 
CCTGAATGGAATAT, and changes in host use, there 
is only one branch where these two changes occur 
concurrently. This results in a CCT value of 0.108. Thus 
the CCT ACT GGTT ACC A ACCT G A AT GGA AT AT 
region shows insignificant correlation with the host shift 
in the 157 isolate data set. 

Locus insertion and deletion among SARS in the 114 
isolate data set 

We optimized the presence and absence of the 29- 
nucleotide regions CCTACTGGTTACCAACCTG 
A AT GGA AT AT and CCA AT AC ATT ACT ATT CG- 
GACTGGTTTAT, on a binary representation of strict 
consensus tree resulting from parsimony search of the 
114 isolate data set (Fig. 3). There are no branches 
where a host shift (Fig. 6) is coincident with an insertion 
or deletion of this fragment. This result indicates, that 
like the 157 isolate data set, the insertion of this 29- 
nucleotide region is not significantly correlated with a 
host shift. Moreover, just as in the 157 isolate dataset, 
the region, CCA AT AC ATT ACT ATTCGG ACT GGT - 
TTAT, occurs in all wholly sequenced genomes of 
SARS-CoV isolated from Chiroptera and is well 
correlated with this host. 

Mutations in the spike protein 

Li et al. (2005) interpret the distribution of states and 
polarity of change of position 479 of the SARS-CoV 
spike protein as follows. Viruses infecting carnivores 
contain a basic residue, arginine (R) or lysine (K). Next 
mutation to a small uncharged residue asparagine (N) 
allowed infection of humans. 



126 


D. Janies et al. / Cladistics 24 (2008) 111-130 


However, in the 157 isolate tree we see a different 
distribution of genotypes and polarities of change. 
SARS-CoV isolated from carnivores exhibit three 
genotypes at position 479: asparagine (N) arginine (R) 
or lysine (K). SARS-CoV infecting humans have two 
genotypes at position 479: asparagine (N) and arginine 
(R). SARS-CoV infecting Chiroptera contain exclu¬ 
sively serine (S) at position 479. SARS-CoV isolated 
from the artiodactyl contain asparagine (N). Consid¬ 
ering the tree in the 157 isolate data set, we observe the 
following mutations at in the spike protein: N479K, 
N479R, S479N, R479N (supplemental data at http:// 
supramap.osu.edu/cov). 

Li et al. (2005) also describe diversity and polarity of 
change for position 487 of the spike protein of SARS- 
CoV. They describe SARS-CoV isolated in 2002-03 to 
contain threonine (T) and SARS-CoV isolated from 
humans and carnivores in 2003-04 to contain serine (S) 
at position 487. 

We observe essentially the same diversity of genotypes 
at position 487 with some additions. SARS-CoV infect¬ 
ing Chiroptera contain primarily valine (V) at position 
487 with the exception of one isolate that contains an 
isoluceine (I). SARS-CoV isolated from the artiodactyl 
exhibits a threonine (T). However, we observe different 
polarities of change than those inferred by Li et al. 
(2005). We observe the muations: V487I, V487T, T487S 
based on the tree from the 157 isolate data set 
(supplemental data at http://supramap.osu.edu/cov). 

We found a statistically signifcant covariation of 
mutation T487S in the spike protein with carnivore 
hosts (Fig. 5 and supplemental data at http://super 
map.osu.edu/cov). The CCT is 0.019 with DELTRAN 
optimization and 0.018 with ACCTRAN optimization. 

We find no correlation of the mutations N479K and 
N479R in the spike protein with change from human to 
carnivore hosts (Fig. 5 and supplemental data at http:// 
supramap.osu.edu/cov) as there are no branches that 
share these mutations and a shift in host. 

Outgroup choice 

As presented in Figs 1-6 and supplemental figures at 
http://supermap.osu.edu/cov, we rooted our phyloge- 
nies on non-SARS coronaviruses. Due to the long 
internal branches (e.g., ranging from 1680 to 3332 steps 
in the 83 isolate data set) between any antigenic groups 
and SARS we decided to use this rooting only for 
visualization. 

The rooting we can present in a figure does not fully 
represent the extent of our analyses. Our tests as to 
whether our results were sensitive to outgroup choice 
showed that our results were not affected by outgroup 
choice. SARS-CoV isolates from human hosts were 
consistently basal to any SARS-CoV isolate from a 
carnivore host irrespective of outgroup choice. 


Discussion 

Based on the SARS-CoV data released as of July 
2006, the polarity of host shifts from human to 
carnivore hosts and humans to artiodactyl host is clear. 
Simply put, the SARS-CoV sequence data from animal 
hosts that has been released as of July 2006 are the 
results of two zoonotic events that occurred after the 
2002-03 outbreak of SARS in humans: one major shift 
from human to carnivore hosts (with subsequent rever¬ 
sals that were not significant to human outbreaks) and 
one shift to an artiodactyl. SARS-CoV isolated from 
Chiroptera are consistently basal to clades containing 
SARS-CoV from human, carnivore and artiodactyl 
hosts. 

Outgroup choice and presentation 

Many of the reports that argue for carnivores as the 
original reservoir of SARS-CoV use a phylogeny to 
support their arguments (Guan et al., 2003; Chinese 
SARS Molecular Epidemiology Consortium, 2004; Kan 
et al., 2005; Song et al., 2005; Zhang, C et al., 2006). 
However, the phylogenies in these studies lack outgroup 
and rooting criteria necessary to derive such evidence for 
the origins of SARS-CoV. Outgroups chosen from 
outside of SARS-CoV are necessary to test the mono- 
phyly of the SARS-CoV ingroup (Barriel and Tassy, 
1998). Moreover in optimal trees, non-SARS-CoV 
outgroups will join the region of the SARS-CoV subtree 
that is closest to the ancestor of SARS and provide a 
point suitable for rooting and subsequent character 
analysis (Grandcolas et al., 2004). 

In the case of Guan et al. [2003, see their figs 2 and 
S2) and the Chinese SARS Molecular Epidemiology 
Consortium (2004); see their fig. S7 of their supple¬ 
mental materials] these researchers simply force the 
root position on their drawings such that they repre¬ 
sent SARS-CoV isolates from animal hosts as ances¬ 
tral. In other drawings, no outgroup is designated 
(Chinese SARS Molecular Epidemiology Consortium, 
2004, fig. 2) or a human SARS-CoV outgroup is used 
and the animal SARS-CoV isolates are omitted from 
the tree (Chinese SARS Molecular Epidemiology 
Consortium, 2004, fig. S6). In the case of Song et al. 
(2005a) human SARS-CoV is designated as the out¬ 
group. Regression methods are used to construct a 
rooted tree in which the date of the most recent 
ancestor is reconstructed as December 2002 (Song 
et al., 2005). Song et al. (2005) conclude that a source 
of disease common to humans and civets must be in 
the environment and further surveys of the CoV in the 
Guangdong region are warranted. In the case of 
Zhang, C et al., 2006, fig. 1; and pers. comm.) an 
outgroup was used for tree construction but not for 
tests of selection. 
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Many researchers agree that SARS represents a 
previously unrecognized fourth lineage of coronaviruses 
(Marra et ah, 2003; Rest and Mindell, 2003; Rota et ah, 
2003). Thus, the non-SARS coronaviruses can serve as 
outgroups to SARS-CoV. This can be revisited if and 
when data on viruses closely related to SARS-CoV 
become available. Alternatively, other researchers used 
a torovirus and/or okavirus outgroup(s) to place SARS- 
CoV as sister to group 2 coronaviruses (Snijder et al., 
2003; Lio and Goldman, 2004). However, based on the 
data in GenBank, toroviruses and okaviruses bear little 
sequence similarity to any coronavirus. The danger in 
use of such distant outgroups is well documented 
(Wheeler, 1990; Graham et al., 2002). In essence, distant 
outgroups act as if they are random sequences resulting 
in spurious attraction to the longest branch available 
among the ingroup. Indeed the branch lengths between 
the major clades of coronaviruses in the 83 and 157 
isolate datasets of this paper are long. This problem is 
addressed in the 114 isolate data set. The best approach 
going forward is to extend sampling of diverse corona- 
virus genomes to search for outgroups of SARS-CoV in 
humans, especially from Chiroptera, carnivores and 
non-human primates. 

Taxonomic sampling affects analyses 

The lack of a good outgroup to SARS-CoV is tied to 
(1) poor sampling of non-SARS coronavirus genomes 
before the 2002-03 SARS outbreak, and (2) the preoc¬ 
cupation with animals in Chinese markets, farms and 
restaurants after the outbreak without regard to highly 
diverse species traded as bush meat in South-east Asia 
(Bell et al., 2004). Before the SARS epidemic, the small 
number of animal coronaviruses that had been se¬ 
quenced were selected primarily from animals of agri¬ 
cultural importance or model organisms. This lack of 
sampling of coronaviruses from wild animals is changing 
as viral surveys of Chiroptera, camelids and bovids are 
published and in preparation (Chu et al., 2006; Domin¬ 
guez et al., 2007; Jina et al., 2007; Zhang, X et al., 2007). 

Insertion of the 29-nucleotide regions 

Presence of the region CCTACTGGTTACCAACC- 
TGAATGGAATAT is correlated with host switching 
beween human and carnivore hosts in the 83 isolate data 
set but is insignificantly correlated with switches from 
human to carnivore hosts in the larger (114 and 157 
isolate) data sets. The concentrated changes test (CCT; 
Madison, 1990) whether a change in one character (e.g., 
insertion or deletion of the 29-nucleotide region) and a 
change in another character (e.g., host phenotype) co¬ 
occur on the same branches of a tree more often than 
expected by chance. In the case of the 83 isolate data set 
we observe a significant correlation between the 


presence of this 29-nucleotide region and carnivore 
hosts. In the case of the 157 isolate data set we observe 
an insignificant correlation. In the case of the 114 isolate 
data set we do not observe changes that strictly co¬ 
occur. However, we do observe that host shifts in the 
114 and 157 isolate data set that host shifts occur in the 
region of the tree in which changes in the 29-nucleotide 
region occurred more basally. Thus, the presence of the 
29-bp region may predispose or be part of a suite of 
genomic changes associated with host shifts. In light of 
these results, it is of interest to implement a relaxed 
concentrated changes test. This test could examine the 
branches in the vicinity of the change of interest for a 
correlated change in a second character. 

Mutations of the spike gene 

Our phylogenetic results shed fresh light on the 
polarity of mutations and diversity of genotypes in the 
spike protein of SARS-CoV. Our results differ from the 
result of Zhang, C et al. (2006) who using CODEML 
(Yang, 1997) and HYPY (Kosakovsky Pond and Frost, 
2005) for a tree-based spike nucleotide sequence analysis 
show that the codon for amino acid position 479 was 
under positive selection and the codon for amino acid 
position 487 was not. The trees used to derive these 
results reflect the same bias seen in other studies—that 
transmission of SARS-CoV was from carnivore to 
human hosts. 

Geographic visualization 

The pattern of geographic spread of SARS-CoV is 
similar to that of avian influenza (H5N1; Janies et al. 
2007) in that both viral lineages that have caused recent 
outbreaks have their origins in Southern China. How¬ 
ever, H5N1 and SARS-CoV contrast in the rapidity in 
which they moved across the planet. The recent 
outbreak lineage of H5N1 has spread from Asia to 
Europe, the Middle East, and Africa during the period 
of 1996-2005 and has not yet arrived in North America. 
In contrast, SARS-CoV spread not only from Asia to 
Europe but also North America in a matter of months 
(November 2002-March 2003). These differences are 
perhaps associated with the fact that SARS-CoV 
infected carnivores in urban markets and a cosmopol¬ 
itan human population with access to world travel. In 
contrast, H5N1 is currently infecting primarily avian 
populations and humans that live in rural settings and 
come into close contact with birds via subsistence 
farming and food processing. 

Further directions 

In order to better understand the molecular epidemi¬ 
ology of SARS-CoV we must develop research 
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programs that include comprehensive sampling and 
phylogenetic analyses of many whole viral genomes, 
including outgroups that are closely related to SARS- 
CoV. As a result of the previously unrecognized 
zoonotic threat they pose, several groups have em¬ 
barked on large-scale sequencing projects on coronavi- 
rus genomes isolated from diverse animal hosts, 
especially Chiroptera, carnivores and primates. These 
efforts will help us pinpoint the zoonotic origins of 
SARS-CoV, develop an understanding of the zoonotic 
potential of coronaviruses as well as the genomic 
changes that underlie host shifts among coronaviruses. 
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spike.aa.pos479.pdf. Phylogenetic tree of 157 corona- 
virus isolates based on whole genomes (sampling in 
Table 2). This is the same tree as Figs 2 and 5 in the 
body of the paper except that in this instance the 
amino acid states at position 479 in the spike locus 
are traced. 

spike.aa.pos487.pdf. Phylogenetic tree of 157 corona- 
virus isolates based on whole genomes (sampling in 
Table 2). This is the same tree as Figs 2 and 5 in the 
body of the paper except that in this instance the 
amino acid states at position 487 in the spike locus 
are traced. 

covll4.host.raxmltree929.names.pdf. RAXML search 
under GTRGAMMA for 114 isolates. Character 
optimization was conducted under equally weighted 
parsimony 


covl 14.host.raxmltree929boot.nex. Tree with boot¬ 
strap values for RAXML search. To be viewed with 
MESQUITE. 

rl000.covll4.jackknife.log. Jackknife values for 114 
isolate data set under equally weighted parsimony. To 
be viewed with a text editor. 

rl000.cov83.jackknife.log. Jackknife values for 83 
isolate data set under equally weighted parsimony. To 
be viewed with a text editor 

rl000.covl57.jackknife.log. Jackknife values for 157 
isolate data set under equally weighted parsimony. To 
be viewed with a text editor 
janiesetal2008covsars.kmz. Keyhole Markup file 
depicting the spread of 114 isolates of SARS-CoV over 
geography. To be opened with Google Earth. See also 
readmesarskml.pdf. 



